Search CORE

27 research outputs found

Performance Optimization on big.LITTLE Architectures: A Memory-latency Aware Approach

Author: Bolchini Cristiana
Henning John L.
Pallipadi Venkatesh
Reddy Basireddy Karunakar
Sozzo E. Del
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/06/2020

The energy demands of modern mobile devices have driven a trend towards heterogeneous multi-core systems, which include various types of cores tuned for performance or energy efficiency, offering a rich optimization space for software. On such systems, data coherency between cores is automatically ensured by an interconnect between processors. On some chip designs, the performance of this interconnect, and by extension of the entire CPU cluster, is highly dependent on the software's memory access characteristics and on the set of frequencies of each CPU core. Existing frequency scaling mechanisms in operating systems use a simple load-based heuristic to tune CPU frequencies, and so fail to achieve a holistically good configuration across such diverse clusters. We propose a new adaptive governor to solve this problem, which uses a simple trained hardware model of cache interconnect characteristics, along with real-time hardware monitors, to continually adjust core frequencies to maximize system performance. We evaluate our governor on the Exynos5422 SoC, as used in the Samsung Galaxy S5, across a range of standard benchmarks. This shows that our approach achieves a speedup of up to 40% and a 70% energy saving, including a 30% speedup in common mobile applications such as video decoding and web browsing.

Crossref

Lancaster E-Prints
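For contrast with the proposed governor, the following is a minimal sketch of the kind of load-based heuristic the abstract criticizes, loosely in the spirit of Linux's ondemand governor but not its actual code. The function name, the 0.8 up-threshold, and the frequency table are illustrative assumptions. It looks only at CPU load and ignores memory access behavior and interconnect state, which is the gap the abstract points out.

```python
from typing import Sequence


def load_based_frequency(load: float, freqs_khz: Sequence[int], up_threshold: float = 0.8) -> int:
    """Pick a frequency from CPU load alone (hypothetical ondemand-like heuristic).

    load is the fraction of the last sampling window the core was busy (0.0 to 1.0).
    """
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be in [0, 1]")
    available = sorted(freqs_khz)
    f_max = available[-1]
    if load >= up_threshold:
        # Heavy load: jump straight to the highest frequency.
        return f_max
    # Otherwise pick the lowest frequency that would keep load near the threshold.
    target = load * f_max / up_threshold
    for f in available:
        if f >= target:
            return f
    return f_max


# Hypothetical frequency table for one cluster, in kHz.
FREQS = [200_000, 800_000, 1_400_000, 2_000_000]
print(load_based_frequency(0.9, FREQS))  # 2000000
print(load_based_frequency(0.2, FREQS))  # 800000
```

Because the decision depends only on busy time, a memory-bound phase that keeps the core "busy" while stalled would still be driven to a high frequency.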

Learning-based run-time power and energy management of multi/many-core systems: current and future trends

Author: Al-Hashimi Bashir
Basireddy Karunakar Reddy
Leech Charles
Merrett Geoff V
Singh Amit Kumar
Publication venue: American Scientific Publishers
Publication date: 01/09/2017

Multi/many-core systems are prevalent in several application domains targeting different scales of computing, such as embedded and cloud computing. These systems are able to fulfil the ever-increasing performance requirements by exploiting their parallel processing capabilities. However, effective power/energy management is required during system operation for several reasons, such as increasing the operational time of battery-operated systems, reducing the energy cost of datacenters, and improving thermal efficiency and reliability. This article provides an extensive survey of learning-based run-time power/energy management approaches, including a taxonomy of these approaches. They perform design-time and/or run-time power/energy management by employing learning principles such as reinforcement learning. The survey also highlights the trends followed by learning-based run-time power management approaches, their upcoming trends, and open research challenges.

University of Essex Research Repository

Southampton (e-Prints Soton)

Crossref
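The survey names reinforcement learning as a typical learning principle. As a hedged illustration only (not drawn from any surveyed paper), the sketch below is a textbook tabular Q-learning agent that picks a V-f level per control epoch. The class and function names, the two workload phases, the hyperparameters, and the reward function are all hypothetical; a real manager would derive its state from performance counters and its reward from measured energy and performance.

```python
import random
from collections import defaultdict


class QLearningDVFS:
    """Tabular Q-learning agent choosing a V-f level index each control epoch."""

    def __init__(self, n_levels: int, alpha: float = 0.5, gamma: float = 0.9,
                 epsilon: float = 0.1, seed: int = 0):
        self.n_levels = n_levels
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = epsilon  # probability of exploring a random level
        self.q = defaultdict(lambda: [0.0] * n_levels)  # state -> value per level
        self.rng = random.Random(seed)

    def act(self, state) -> int:
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_levels)  # explore
        values = self.q[state]
        return max(range(self.n_levels), key=values.__getitem__)  # exploit (first best on ties)

    def update(self, state, action: int, reward: float, next_state) -> None:
        # Standard one-step Q-learning update.
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


def train_on_toy_workload(agent: QLearningDVFS, steps: int = 2000) -> None:
    """Hypothetical workload alternating between two phases.

    The reward is a stand-in for an energy/performance trade-off: it penalizes
    distance from an assumed best level (low for memory-bound, high for compute-bound).
    """
    best_level = {"memory": 0, "compute": agent.n_levels - 1}
    phases = ["compute", "memory"]
    for step in range(steps):
        state, next_state = phases[step % 2], phases[(step + 1) % 2]
        action = agent.act(state)
        agent.update(state, action, -abs(action - best_level[state]), next_state)


agent = QLearningDVFS(n_levels=4)
train_on_toy_workload(agent)
agent.epsilon = 0.0  # act greedily after training
print(agent.act("compute"), agent.act("memory"))  # 3 0
```

The epsilon-greedy rule trades exploration of untried levels against exploiting the best-known one; after training, exploration is switched off to read out the learned policy.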

Energy efficient run-time mapping and thread partitioning of concurrent OpenCL applications on CPU-GPU MPSoCs

Author: Basireddy Karunakar Reddy
Greenhalgh Peter
Grewe Dominik
Pourmohseni Behnaz
Singh Amit Kumar
Wen Yuan
Publication venue: Association for Computing Machinery (ACM)
Publication date: 27/09/2017

Heterogeneous Multi-Processor Systems-on-Chips (MPSoCs) containing CPU and GPU cores are typically required to execute applications concurrently. However, as will be shown in this paper, existing approaches are not well suited for concurrent applications, as they either consider only a single application or do not exploit both CPU and GPU cores at the same time. In this paper, we propose an energy-efficient run-time mapping and thread partitioning approach for executing concurrent OpenCL applications on both CPU and GPU cores while satisfying performance requirements. Depending upon the performance requirements of each concurrently executing application, the mapping process finds the appropriate number of CPU cores and the operating frequencies of the CPU and GPU cores, and the partitioning process identifies an efficient partitioning of the applications' threads between CPU and GPU cores. We validate the proposed approach experimentally on the Odroid-XU3 hardware platform with various mixes of applications from the PolyBench benchmark suite. Additionally, a case study is performed with a real-world application, SLAMBench. Results show an average energy saving of 32% compared to existing approaches while still satisfying the performance requirements.

University of Essex Research Repository

Southampton (e-Prints Soton)

Crossref
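The abstract does not detail how the partitioning process works. One common baseline, shown here only as an assumption-laden sketch and not as the paper's method, splits an OpenCL kernel's work-items in proportion to each device's measured throughput so that the CPU and GPU finish together. The function names and the throughput values are hypothetical.

```python
from typing import Tuple


def partition_work_items(total_items: int, cpu_rate: float, gpu_rate: float) -> Tuple[int, int]:
    """Split work-items so the CPU and GPU finish at about the same time.

    cpu_rate and gpu_rate are assumed throughputs (work-items per second) measured
    for this kernel at the chosen core counts and frequencies.
    """
    if total_items < 0 or cpu_rate <= 0 or gpu_rate <= 0:
        raise ValueError("need total_items >= 0 and positive rates")
    # Each device's share is proportional to its throughput, which equalizes
    # cpu_items / cpu_rate and gpu_items / gpu_rate (up to rounding).
    cpu_items = round(total_items * cpu_rate / (cpu_rate + gpu_rate))
    return cpu_items, total_items - cpu_items


def estimated_finish_time(cpu_items: int, gpu_items: int, cpu_rate: float, gpu_rate: float) -> float:
    """Both devices run in parallel, so the slower one determines completion."""
    return max(cpu_items / cpu_rate, gpu_items / gpu_rate)


cpu, gpu = partition_work_items(1000, cpu_rate=1.0, gpu_rate=3.0)
print(cpu, gpu)  # 250 750
```

Changing the number of CPU cores or the CPU/GPU frequencies changes the measured rates, which is why mapping and partitioning decisions interact.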

Computer Support of the Educational Process

Author: Basireddy Karunakar Reddy
Dey Somdip
Guajardo Enrique Zaragoza
McDonald-Maier Klaus
Singh Amit Kumar
Wang Xiaohang
Publication venue: Russian State Vocational Pedagogical University
Publication date: 01/01/2002

Thermal cycling, as well as temperature gradients in time and space, affects the lifetime reliability and performance of heterogeneous multiprocessor systems-on-chips (MPSoCs). Conventional temperature management techniques are not intelligent enough to cater for the performance, energy efficiency, and operating temperature of the system. In this paper, we propose a novel lightweight thermal management mechanism in the form of an intelligent software agent, which monitors and regulates the operating temperature of the CPU cores to improve the reliability of the system. We validated our methodology on the Odroid-XU4 SoC, where it reduced the operating temperature by 6.32% while improving performance by 7.96% and reducing power consumption by 9.45% compared with the state of the art.

University of Essex Research Repository

Southampton (e-Prints Soton)

Crossref

Institutional repository of Russian State Vocational Pedagogical University

York St John University Institutional Repository

Dynamic Energy and Thermal Management of Multi-Core Mobile Platforms: A Survey

Author: Al-Hashimi Bashir M
Basireddy Karunakar Reddy
Dey Somdip
McDonald-Maier Klaus
Merrett Geoff V
Singh Amit Kumar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 23/03/2020

Multi-core mobile platforms are on the rise, as they enable efficient parallel processing to meet ever-increasing performance requirements. However, since these platforms need to cater for increasingly dynamic workloads, efficient dynamic resource management is desired, mainly to enhance energy and thermal efficiency for a better user experience, with increased operational time and lifetime of mobile devices. This article provides a survey of dynamic energy and thermal management approaches for multi-core mobile platforms. These approaches perform either proactive or reactive management. Upcoming trends and open challenges are also discussed.

University of Essex Research Repository

Southampton (e-Prints Soton)

The University of Manchester - Institutional Repository

King's Research Portal

York St John University Institutional Repository

Predictive Thermal Management for Energy-Efficient Execution of Concurrent Applications on Heterogeneous Multicores

Author: Al-Hashimi Bashir M
Basireddy Karunakar Reddy
de Bellefroid Cedric
Merrett Geoff
Singh Amit Kumar
Wachter Eduardo Weber
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2019

Current multicore platforms contain different types of cores, organized in clusters (e.g., ARM's big.LITTLE). These platforms deal with concurrently executing applications that have varying workload profiles and performance requirements. Runtime management is imperative for adapting to such performance requirements and workload variabilities, and for increasing energy and temperature efficiency. Temperature has also become a critical parameter, since it affects reliability, power consumption, and performance, and hence must be managed. This paper proposes an accurate temperature prediction scheme coupled with a runtime energy management approach to proactively avoid exceeding temperature thresholds while maintaining performance targets. Experiments show up to 20% energy savings while keeping both average and peak temperatures below the threshold. Compared with state-of-the-art temperature predictors, the proposed predictor is 35% faster and reduces the mean absolute error from 3.25 to 1.15 °C for the evaluated application scenarios.

University of Essex Research Repository

Southampton (e-Prints Soton)

King's Research Portal
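The paper's predictor is not described in the abstract. As a stand-in, the sketch below uses a generic first-order RC thermal model (an assumption, not the paper's scheme) to show the proactive idea: predict the temperature a few seconds ahead for each candidate V-f level and choose the highest level predicted to stay under the threshold. It also includes the mean absolute error metric the abstract uses to compare predictors. All parameter values and function names are hypothetical.

```python
import math
from typing import Sequence


def predict_temperature(temp_c: float, power_w: float, horizon_s: float,
                        ambient_c: float = 25.0, r_th: float = 10.0, tau_s: float = 5.0) -> float:
    """First-order RC thermal model: temperature decays exponentially toward
    the steady state ambient + R_th * P with time constant tau.

    ambient_c, r_th (degC per W) and tau_s are hypothetical calibration values.
    """
    steady_c = ambient_c + r_th * power_w
    return steady_c + (temp_c - steady_c) * math.exp(-horizon_s / tau_s)


def highest_safe_level(temp_c: float, level_power_w: Sequence[float],
                       threshold_c: float, horizon_s: float) -> int:
    """Proactively pick the highest V-f level predicted to stay below the threshold."""
    for level in range(len(level_power_w) - 1, -1, -1):
        if predict_temperature(temp_c, level_power_w[level], horizon_s) < threshold_c:
            return level
    return 0  # nothing is predicted safe: fall back to the lowest level


def mean_absolute_error(predicted: Sequence[float], measured: Sequence[float]) -> float:
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - m) for p, m in zip(predicted, measured)) / len(measured)


# Hypothetical per-level cluster power (W); current temperature 60 degC.
print(highest_safe_level(60.0, [1.0, 3.0, 5.0], threshold_c=65.0, horizon_s=5.0))  # 1
```

Acting on the predicted rather than the current temperature is what makes the control proactive: the highest level is rejected here even though the current 60 °C reading is below the 65 °C threshold.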

Runtime energy management of concurrent applications for multi-core platforms

Author: Basireddy Karunakar Reddy
Publication venue: University of Southampton
Publication date: 01/04/2019

Multi-core platforms are employing a greater number of heterogeneous cores and resource configurations to achieve energy efficiency and high performance. These platforms often execute applications with different performance constraints concurrently, which contend for resources simultaneously, thereby generating varying workload and resource demands over time. There is little reported work on runtime energy management of concurrent execution, and it focuses mostly on homogeneous multi-cores and limited application scenarios. This thesis considers both homogeneous and heterogeneous multi-cores and broadens the application scenarios. The following contributions are made in this thesis. First, this thesis presents online Dynamic Voltage and Frequency Scaling (DVFS) techniques for concurrent execution of single-threaded and multi-threaded applications on homogeneous multi-cores. This includes an experimental analysis and the derivation of metrics for efficient online workload classification. The DVFS level is proactively set through the predicted workload, measured through Memory Reads Per Instruction. The analysis also considers thread synchronisation overheads and the underlying memory and DVFS architectures. Average energy savings of up to 60% are observed when evaluated on three different hardware platforms (Odroid-XU3, Intel Xeon E5-2630, and Xeon Phi 7620P). Next, an energy-efficient static mapping and DVFS approach is proposed for heterogeneous multi-core CPUs. This approach simultaneously exploits different types of cores for each application in a concurrent execution scenario. It first uses offline results to select, for each application, the mapping (number and type of cores) that meets its performance requirement with minimum energy consumption. Then online DVFS is applied to adapt to workload and performance variations. Compared to recent techniques, the proposed approach has on average 33% lower energy consumption when validated on the Odroid-XU3.
To eliminate the dependency on offline application profiling and to adapt to dynamic application arrival/completion, an adaptive mapping approach coupled with DVFS is presented. This is achieved through an accurate performance model, an energy-efficient resource selection technique, and a resource manager. Experimental evaluation on the Odroid-XU3 shows an improvement of up to 28% in energy efficiency and 7.9% better prediction accuracy by the performance models.

Southampton (e-Prints Soton)
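The first step of the static approach, choosing for each application the performance-meeting mapping with minimum energy from offline results, can be sketched as a filter followed by a minimum. The record layout, field names, and numbers below are hypothetical; the thesis's actual profiling data and any cross-application core-count constraints are not modeled here.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class MappingProfile:
    """One offline-profiled configuration for a single application (hypothetical schema)."""
    big_cores: int
    little_cores: int
    ips: float       # measured instructions per second
    energy_j: float  # measured energy to complete the application, in joules


def select_mapping(profiles: Iterable[MappingProfile], required_ips: float) -> Optional[MappingProfile]:
    """Return the minimum-energy mapping that meets the performance requirement, or None."""
    feasible = [p for p in profiles if p.ips >= required_ips]
    if not feasible:
        return None  # no profiled mapping meets the requirement
    return min(feasible, key=lambda p: p.energy_j)


# Illustrative numbers only, not measurements.
profiles = [
    MappingProfile(big_cores=0, little_cores=4, ips=1.2e9, energy_j=30.0),
    MappingProfile(big_cores=2, little_cores=0, ips=2.0e9, energy_j=45.0),
    MappingProfile(big_cores=1, little_cores=2, ips=1.8e9, energy_j=38.0),
    MappingProfile(big_cores=4, little_cores=0, ips=3.5e9, energy_j=60.0),
]
print(select_mapping(profiles, required_ips=1.5e9))
```

Filtering before minimizing ensures energy is only traded off among configurations that already meet the performance constraint; the lowest-energy mapping overall (four LITTLE cores here) is rejected because it is too slow.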

Online concurrent workload classification for multi-core energy management

Author: Al-Hashimi Bashir
Basireddy Karunakar Reddy
Merrett Geoff
Singh Amit
Publication date: 01/03/2018

Modern embedded multi-core processors are organized as clusters of cores, where all cores in each cluster operate at a common voltage-frequency (V-f) setting. Such processors often need to execute applications concurrently, exhibiting varying and mixed workloads (e.g., compute- and memory-intensive) depending on the instruction mix and resource sharing. Runtime adaptation is key to achieving energy savings without trading off application performance under such workload variabilities. In this paper, we propose an online energy management technique that performs concurrent workload classification using the metric Memory Reads Per Instruction (MRPI) and proactively selects an appropriate V-f setting through workload prediction. Subsequently, it monitors the workload prediction error and the performance loss, quantified by Instructions Per Second (IPS), at runtime and adjusts the chosen V-f setting to compensate. We validate the proposed technique on an Odroid-XU3 with various combinations of benchmark applications. Results show an improvement in energy efficiency of up to 69% compared to existing approaches.

Southampton (e-Prints Soton)
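A minimal sketch of the loop described above, under stated assumptions: MRPI is computed from per-interval counter readings, an exponentially weighted moving average stands in for the paper's workload predictor, a hypothetical table maps MRPI ranges to V-f level indices, and a simple one-step rule raises the level when measured IPS falls below a target (the paper's use of prediction error is not modeled). The bin boundaries, names, and the EWMA weight are illustrative, not taken from the paper.

```python
from typing import Sequence, Tuple

# Hypothetical MRPI upper bounds and the V-f level index used below each bound
# (0 = lowest frequency). Memory-intensive phases (high MRPI) get lower levels,
# since stalls on memory reads do not speed up with core frequency.
MRPI_BINS: Sequence[Tuple[float, int]] = [(0.005, 4), (0.01, 3), (0.02, 2), (0.04, 1)]


def mrpi(memory_reads: int, instructions: int) -> float:
    """Memory Reads Per Instruction over one sampling interval."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    return memory_reads / instructions


def predict_mrpi(previous_prediction: float, observed: float, weight: float = 0.5) -> float:
    """Exponentially weighted moving average, used here as a stand-in workload predictor."""
    return weight * observed + (1.0 - weight) * previous_prediction


def level_for_mrpi(predicted: float) -> int:
    for upper_bound, level in MRPI_BINS:
        if predicted < upper_bound:
            return level
    return 0  # very memory-intensive: lowest level


def compensate(level: int, measured_ips: float, target_ips: float, max_level: int) -> int:
    """Step up one level if the measured IPS falls short of the target."""
    if measured_ips < target_ips and level < max_level:
        return level + 1
    return level


m = mrpi(memory_reads=100, instructions=10_000)  # 0.01
print(level_for_mrpi(m))  # 2
print(compensate(2, measured_ips=0.9e9, target_ips=1.0e9, max_level=4))  # 3
```

Because all cores in a cluster share one V-f setting, the MRPI fed to this loop would in practice be aggregated over the applications running on that cluster.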

Dataset supporting the article entitled "Online Concurrent Workload Classification for Multi-core Energy Management"

Author: Al-Hashimi Bashir
Basireddy Karunakar Reddy
Merrett Geoff
Singh Amit
Publication venue: University of Southampton

This dataset supports the article entitled "Online Concurrent Workload Classification for Multi-core Energy Management" accepted for publication in ACM/IEEE Design, Automation and Test in Europe (DATE), 2017.

Southampton (e-Prints Soton)

Memory and thread synchronization contention-aware DVFS for HPC systems

Author: Al-Hashimi Bashir
Basireddy Karunakar Reddy
Merrett Geoff
Weber Wachter Eduardo
Publication date: 01/06/2018

Due to the operating costs and failure rates of computing platforms, energy efficiency has become a major concern for modern and future many-core systems. In the quest for high performance, the growth rate of power consumption must slow down while more performance is delivered per unit of power. To improve the energy efficiency of such systems, processors are equipped with low-power techniques such as dynamic voltage and frequency scaling (DVFS) and power capping. These techniques must be controlled carefully according to the workload; otherwise, they may cause significant performance loss and/or power consumption due to system overheads (e.g., DVFS transition latency). Existing approaches [1], [2] are not effective in adapting to workload variations, as they do not consider the combined effect of application compute/memory intensity, thread synchronization contention, and non-uniform memory access (NUMA) owing to the underlying processor architecture. This poster discusses a workload-aware runtime energy management technique that takes these factors into account for efficient V-f control.

Southampton (e-Prints Soton)
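To illustrate why memory intensity and synchronization contention matter for V-f control, here is a hedged sketch of a simple two-part performance model often used in DVFS work (an assumption, not the poster's technique): only the compute portion of an interval scales with frequency, so stall-heavy intervals can run at a lower frequency for the same slowdown bound. NUMA effects and DVFS transition latency are not modeled; function names and numbers are hypothetical.

```python
from typing import Sequence


def predicted_runtime(t_compute_at_fmax: float, t_stall: float, freq: float, f_max: float) -> float:
    """Runtime at `freq` under a simple two-part model (an assumption):
    compute time scales with 1/freq, while time spent stalled on memory or waiting
    on synchronization is treated as frequency-independent."""
    return t_compute_at_fmax * f_max / freq + t_stall


def lowest_frequency_within_slowdown(t_compute_at_fmax: float, t_stall: float,
                                     freqs: Sequence[float], max_slowdown: float) -> float:
    """Lowest available frequency whose predicted slowdown versus f_max is acceptable."""
    f_max = max(freqs)
    budget = (t_compute_at_fmax + t_stall) * (1.0 + max_slowdown)
    for f in sorted(freqs):
        if predicted_runtime(t_compute_at_fmax, t_stall, f, f_max) <= budget:
            return f
    return f_max


FREQS_GHZ = [1.0, 1.5, 2.0]
# Stall-heavy (memory/synchronization-bound) vs compute-bound intervals, in seconds at f_max.
print(lowest_frequency_within_slowdown(1.0, 3.0, FREQS_GHZ, max_slowdown=0.1))  # 1.5
print(lowest_frequency_within_slowdown(3.0, 1.0, FREQS_GHZ, max_slowdown=0.1))  # 2.0
```

With the same 10% slowdown budget, the stall-heavy interval tolerates a lower frequency while the compute-bound one must stay at the maximum, which is the kind of distinction a purely load-based policy cannot make.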